ABSTRACT

This study investigates the phylogenetic distribution of homology
to Dicathais orbita hypobranchial gland genes based on tBLASTx
pairwise sequence alignments against the GenBank database.
Suppressive subtractive hybridization was used to obtain 417
non-redundant genes that were up-regulated or uniquely expressed
in the hypobranchial gland relative to mantle tissue. Of these,
133 sequences revealed matches to the database, with the remaining
68% of genes appearing to be novel sequences. Homologous sequence
matches were observed for a wide range of evolutionarily divergent
taxa, encompassing animals, protozoans, plants, fungi, bacteria,
and viruses. The highest frequency of homology was found towards
chordate sequences, followed by the Mollusca, which highlights the
current bias in availability of vertebrate versus invertebrate
sequences in the database. An unexpectedly high proportion of
matches was also found toward the Ciliophora, indicating a
possible symbiotic relationship, as well as the Ascomycota and
Streptophyta, which share the ability to biosynthesize indole
derivatives with Muricidae such as Dicathais orbita. Overall,
these results reveal the usefulness of undertaking sequence
comparisons in gene expression studies and highlight the current
paucity of knowledge of molluscan genomes.
INTRODUCTION

The development of genomic technologies has had a dramatic effect
on all fields of the biological sciences (Collins et al., 2003).
Since the completion of the human genome project in 2003 (Collins
et al., 2003), the number of genomes available has grown
dramatically. As of November 2007, a total of 426 eukaryotic (24
complete, 164 undergoing assembly, and 238 in progress) and 599
bacterial genomes were available on the GenBank database (NCBI,
2007). The increased number of genomes available enhances our
understanding of the biology of the species in question and
provides a basis for comparative studies in functional biology.
Despite this increase in data, trends in comparative genomics
favor the analysis of mammalian sequences (Barnes et al., 2004),
and homology-based identification and classification of
non-vertebrate sequences is often more challenging.

The Mollusca has been identified as the second most diverse and
speciose phylum in the animal kingdom, with members present in
marine, freshwater, and terrestrial environments (Pechenik, 2000).
Despite their abundance and the economic importance of many
species (Beesley et al., 1998), the genomes of mollusks remain
relatively uncharted. So far, the complete genome has only been
sequenced for the Californian sea hare Aplysia californica Cooper,
1863, and this is yet to be annotated (NCBI, 2007). The bivalves
Argopecten irradians (Lamarck, 1819), Crassostrea virginica
(Gmelin, 1791), and Spisula solidissima (Dillwyn, 1817), which are
all important fisheries resources, and the medically important
freshwater snail Biomphalaria glabrata, are currently undergoing
sequencing (NCBI, 2007). Nevertheless, a major hurdle in molluscan
genomics lies in defining the functions of the sequences
identified. Sequence homology has been used heavily to assign
functions in mammalian genomes. However, the lack of currently
available molluscan and invertebrate sequences limits the ability
to assign gene function using comparative classifications drawn
from existing invertebrate genomic sequence information. Broader
comparisons to more distantly related organisms could nonetheless
yield novel information about well-conserved genes or genes that
have independently evolved convergent functions in distinct taxa.

P. W. Laffy et al., 2009

The hypobranchial gland of neogastropods is a uniquely molluscan
organ (Beesley et al., 1998) of uncertain origin and function
(Westley et al., 2006). Within the family Muricidae, it is the
well-known source of the ancient dye Tyrian purple (Baker, 1974;
Cooksey, 2001). Tyrian purple is generated by a series of chemical
reactions from indoxyl sulphate precursors, brominated secondary
metabolites thought to be derived from the amino acid tryptophan
(Westley et al., 2006). While the Muricidae are thought to be the
only source of the purple brominated dye, the related blue dye
indigo is produced by a number of other taxa including plants,
bacteria and fungi (Epstein et al., 1969; Meijer et al., 2006;
Mayser et al., 2007). This presents an interesting case of
apparent convergent evolution in biosynthetic capabilities.

Basic Local Alignment Search Tool (BLAST) analysis is a key tool
used to identify orthologous genes from different organisms, and
its use has been instrumental in classifying countless sequences
(Galagan et al., 2003; Venter et al., 2001). The taxonomic
classifications of high-scoring BLAST matches to unclassified
sequences are useful in identifying sequences with specific or
variable functions and may indicate key gaps in the current
sequence data for members of specific phyla. This study results
from a larger project that is currently underway to identify the
genes expressed in the hypobranchial gland of Dicathais orbita
(Gmelin, 1791), a predatory marine gastropod belonging to the
family Muricidae, order Neogastropoda. Here we report on our
tBLASTx analysis, in which each sequence was translated in all six
reading frames and compared to the six-frame translations of every
nucleotide sequence in GenBank, to observe trends in molluscan
sequence similarity and assess the proportion of homologous genes
expressed in this unique biosynthetic organ.
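
Because tBLASTx compares conceptual translations rather than raw
nucleotides, it can help to see what a six-frame translation looks
like. The following is a minimal Python sketch, not the BLAST
implementation: the function names and the example sequence are
hypothetical, and it assumes the standard genetic code with
unambiguous A/C/G/T input (any other codon is rendered as X).

```python
from itertools import product

# Standard genetic code, built in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translations(seq):
    """Map frame labels (+1..+3, -1..-3) to protein translations."""
    seq = seq.upper()
    frames = {}
    for strand, template in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(template[offset:])
    return frames


# Hypothetical toy sequence, not taken from the study.
example_frames = six_frame_translations("ATGGCCTAA")
```

For the toy input ATGGCCTAA, frame +1 reads as MA* (methionine,
alanine, stop). A tBLASTx search aligns all six such translations
of the query against all six translations of each database
sequence.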


MATERIALS AND METHODS 

A suppressive subtractive hybridization (SSH) (Diatchenko et al.,
1999) cDNA library containing the genes up-regulated or
differentially expressed in the hypobranchial gland of D. orbita
relative to mantle tissue was created using a Clontech PCR-Select™
cDNA Subtraction Kit (Clontech, California, USA). The RNAqueous®
RNA extraction kit (Ambion, Texas, USA), TRI Reagent® (Ambion) and
DNase I (Invitrogen, CA, USA) digestion were used to obtain RNA
from the hypobranchial glands and mantle of two D. orbita
specimens. The subtraction was performed utilizing pooled
hypobranchial gland transcripts as the tester population and
pooled mantle transcripts as the driver population. Subtracted
cDNAs produced by SSH were cloned into the pGEM®-T Easy vector
(Promega, Wisconsin, USA). Colonies with inserts were selected,
plasmid DNA was purified, and sequencing was performed by the
Southpath and Flinders Sequencing Facility (Adelaide, Australia)
or the Australian Genome Research Facility (AGRF sequencing,
Brisbane, Australia). A total of 554 plasmids were sequenced, and
vector sequence and adaptor regions were removed. Contigs were
formed using Sequencher Version 4.1.4, yielding a non-redundant
set of expressed sequence tags (ESTs) differentially expressed in
the hypobranchial gland of D. orbita. In total, the 417 resulting
unique sequences were submitted to tBLASTx analysis, and the
highest-scoring matches for all sequences with an E value smaller
than 1e-5 were collated. The phylum of the orthologous sequence
was recorded, and in cases where the highest-scoring tBLASTx match
was a molluscan sequence, the class was determined. In cases where
the matching sequence belonged to a member of the class
Gastropoda, the family was also recorded. The total number of
GenBank sequences for phyla with 5 or more sequence matches was
recorded (Figure 1).
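
The selection rule described above (for each EST, keep the
highest-scoring hit with an E value below 1e-5, then record its
phylum) can be sketched as follows. This is an illustrative Python
sketch only: the record fields (query, evalue, bitscore, phylum)
and the example hits are hypothetical and do not reflect the
study's data or the BLAST output format used by the authors.

```python
from collections import Counter

E_VALUE_CUTOFF = 1e-5  # threshold stated in the methods


def best_significant_hits(hits, cutoff=E_VALUE_CUTOFF):
    """Return {query_id: hit} with the top-scoring hit below the cutoff."""
    best = {}
    for hit in hits:
        if hit["evalue"] >= cutoff:
            continue  # not significant, ignore
        current = best.get(hit["query"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["query"]] = hit
    return best


def tally_phyla(best_hits):
    """Count how many queries have their best hit in each phylum."""
    return Counter(hit["phylum"] for hit in best_hits.values())


# Hypothetical records for illustration; not data from the study.
EXAMPLE_HITS = [
    {"query": "EST001", "evalue": 1e-30, "bitscore": 120.0, "phylum": "Mollusca"},
    {"query": "EST001", "evalue": 1e-12, "bitscore": 70.0, "phylum": "Chordata"},
    {"query": "EST002", "evalue": 0.5, "bitscore": 20.0, "phylum": "Arthropoda"},
    {"query": "EST003", "evalue": 1e-8, "bitscore": 55.0, "phylum": "Ciliophora"},
]

example_tally = tally_phyla(best_significant_hits(EXAMPLE_HITS))
```

In this toy input EST002 has no significant match, so it would be
counted among the apparently novel sequences rather than assigned
to a phylum.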


RESULTS 

A total of 133 sequences out of 417 (31.9%) resulted in
significant tBLASTx matches, with 23 different phyla represented
among the best-scoring BLAST matches for the identified sequences.
Seven of these phyla had matches to 5 or more hypobranchial gland
sequences from D. orbita. The Chordata showed the highest number
of matches, with 32 homologous sequences identified, closely
followed by the Mollusca with 31 matches (Figure 1). Ciliophora
were the third most abundant phylum with 15 matches, followed by
the invertebrate phyla Arthropoda and Echinodermata, with 12 and 8
sequences identified, respectively (Figure 1). Seven Ascomycota
homologs were identified in the hypobranchial gland of D. orbita,
as well as five from the Streptophyta (Figure 1).

Of the 31 molluscan sequence matches identified, 22 matched
gastropod sequences. Twelve of these gastropod sequence homologs
belonged to other members of the family Muricidae (Figure 1).
Further distribution of the sequence homology is detailed in
Figure 1.
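
The proportions quoted in these results can be checked with simple
arithmetic. The short Python sketch below uses only counts stated
in the text; the variable and function names are illustrative.

```python
TOTAL_ESTS = 417   # non-redundant sequences submitted to tBLASTx
MATCHED = 133      # sequences with a significant match

# Best-match counts for the seven phyla with 5 or more matches.
TOP_PHYLA = {
    "Chordata": 32,
    "Mollusca": 31,
    "Ciliophora": 15,
    "Arthropoda": 12,
    "Echinodermata": 8,
    "Ascomycota": 7,
    "Streptophyta": 5,
}


def percent(part, whole):
    """Percentage of whole represented by part, to one decimal place."""
    return round(100 * part / whole, 1)


matched_pct = percent(MATCHED, TOTAL_ESTS)
unmatched = TOTAL_ESTS - MATCHED
unmatched_pct = percent(unmatched, TOTAL_ESTS)
in_top_phyla = sum(TOP_PHYLA.values())
in_other_phyla = MATCHED - in_top_phyla
```

This gives 31.9% matched and 284 sequences (68.1%) unmatched. The
seven listed phyla account for 110 of the 133 matches, leaving 23
matches spread across the remaining 16 phyla.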


DISCUSSION 

While 133 of the sequences produced had BLAST matches that
indicated the function of the transcripts, the remaining 284 genes
sequenced from the hypobranchial gland of Dicathais orbita appear
to be novel, highlighting the limited information currently
available on molluscan genomes. The high frequency of matches to
chordate sequences is likely to be due to the large abundance of
vertebrate sequences in the public database (Barnes et al., 2004)
(Table 1). There are currently over 57 million sequences from the
Chordata, compared to fewer than 600,000 molluscan sequences
(Table 1). There is clearly a bias towards tBLASTx returning
matches to human and other chordate sequences, for which over 90
times as many sequences are available for alignment as for
mollusks.

Figure 1. Phyla represented by highest-scoring tBLASTx matches of
genes expressed in the hypobranchial gland of Dicathais orbita. A
total of 417 non-redundant EST sequences were analysed using
tBLASTx, and the resulting 133 significant matches (E value <
10⁻⁵) were placed into 23 categories based on the phylum of the
highest-scoring tBLASTx match. Sequences grouped in the phylum
Mollusca were further classified by the class of the best tBLASTx
match. Gastropod sequences were further divided according to the
family of the highest-scoring tBLASTx match.


The abundance of matches to sequences from the Ciliophora was
unexpected, particularly since the number of ciliate sequences in
public databases is just over 300,000 (Table 1). It is possible
that these protozoan gene matches actually derive from the genomes
of ciliate endosymbionts occurring within the hypobranchial gland
of D. orbita. Ciliates are ubiquitous protists that commonly form
relationships with other species, such as the parasitic
Ichthyophthirius multifiliis (Abernathy et al., 2007) and the
symbiotic Euplotes uncinatus (Lobban et al., 2005).

The abundance of matches to arthropod species was not unexpected
given the shared ancestry of the Mollusca and Arthropoda. However,
the numerous matches to Echinodermata are less expected given that
this phylum occurs on the deuterostome lineage along with
chordates, which diverged from the mollusks and other protostomes
over 100 million years ago (Heckman et al., 2001). Notably, there
were relatively few matches to the Annelida (Figure 1) despite the
fact that this abundant protostome phylum occurs within the
lophotrochozoan lineage alongside the Mollusca, which forms a
separate clade from the Ecdysozoa, including arthropods and
nematodes (Aguinaldo et al., 1997). It is likely that the small
number of annelid sequences available, fewer than 35,000 (Table
1), contributed to the low incidence of annelid sequence homology
with our molluscan sequences. This further highlights the
relatively limited genetic information that is available for
so-called “primitive” invertebrate phyla.

Table 1. Number of nucleotide sequences available on the GenBank
database for different phyla as published on 18 December 2007. All
data were compiled as published under the Taxonomy browser
available on the NCBI Entrez taxonomy home page,
http://www.ncbi.nlm.nih.gov/sites/entrez?db=Taxonomy.

Phylum           GenBank nucleotide sequences
Chordata                           57,495,211
Mollusca                              599,894
Ciliophora                            303,304
Arthropoda                          5,090,469
Echinodermata                         941,561
Ascomycota                          1,367,029
Streptophyta                       22,413,399
Annelida                               34,245
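
One way to gauge how strongly database size shapes these results
is to divide the number of best matches per phylum (from the
Results) by the number of GenBank sequences in Table 1. The Python
sketch below does this as a rough illustration. The per-million
normalization is an assumption introduced here, not an analysis
performed in the study, and Annelida is omitted from the rates
because its match count is not stated in the text.

```python
# GenBank nucleotide sequence counts from Table 1 (18 December 2007).
GENBANK_SEQUENCES = {
    "Chordata": 57_495_211,
    "Mollusca": 599_894,
    "Ciliophora": 303_304,
    "Arthropoda": 5_090_469,
    "Echinodermata": 941_561,
    "Ascomycota": 1_367_029,
    "Streptophyta": 22_413_399,
    "Annelida": 34_245,
}

# Best-match counts reported in the Results for phyla with 5+ matches.
MATCHES = {
    "Chordata": 32,
    "Mollusca": 31,
    "Ciliophora": 15,
    "Arthropoda": 12,
    "Echinodermata": 8,
    "Ascomycota": 7,
    "Streptophyta": 5,
}


def matches_per_million(matches, db_sizes):
    """Best matches per million database sequences, by phylum."""
    return {
        phylum: count / db_sizes[phylum] * 1_000_000
        for phylum, count in matches.items()
    }


rates = matches_per_million(MATCHES, GENBANK_SEQUENCES)
chordate_to_mollusc_ratio = (
    GENBANK_SEQUENCES["Chordata"] / GENBANK_SEQUENCES["Mollusca"]
)

# Print phyla from the highest to the lowest normalized match rate.
for phylum, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{phylum:<15}{rate:8.2f} matches per million sequences")
```

Under this crude normalization the chordate data set is about 95.8
times the size of the molluscan one, yet the Mollusca and
Ciliophora each return roughly 50 matches per million database
sequences, against fewer than one for the Chordata.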
The frequency of sequence matches to the fungal Ascomycota and the
plant Streptophyta was an unexpected finding. This is possibly
related to the fact that members of both the Streptophyta and
Ascomycota are capable of secondary metabolite production similar
to that of the muricid Dicathais orbita. Indigo is produced in
Isatis tinctoria (phylum Streptophyta) (Epstein et al., 1967), and
the production of indole compounds has been reported for Candida
glabrata (phylum Ascomycota) (Mayser et al., 2007). These
compounds are in the same chemical class of indole alkaloids as
Tyrian purple, the brominated derivative of indigo secreted only
from the hypobranchial gland of the Muricidae (Cooksey, 2001;
Westley et al., 2006). These similarities in secondary metabolite
production may influence the frequency of homology with genes
expressed in the hypobranchial gland of D. orbita. Further
analysis of the conserved genes could help reveal some key
biosynthetic enzymes and/or processes. As SSH allows for
amplification of only up-regulated or uniquely expressed genes in
this instance, we would expect sequences involved in chemical and
protein biosynthesis to be amplified. This demonstrates that it is
important to consider the source of expressed genes when
interpreting sequence homology.

Another key observation is the frequency and variation of
molluscan gene matches observed in our tBLASTx analysis. As
mentioned, a total of 31 molluscan sequence matches were
identified, including 22 gastropod sequences, 12 of which belonged
to the family Muricidae (Figure 1). This trend is expected, as
species within the same family should show greater homology with
our D. orbita sequences. The key limiting factor on the number of
muricid and gastropod sequence matches is the limited amount of
sequencing that has been performed on these groups: only 1,994
Muricidae sequences had been published on the NCBI database as of
November 2007 (NCBI, 2007). The majority of sequences available
for muricids are highly conserved genes used in phylogenetic
analysis, such as ribosomal RNA (Colgan et al., 2007; Harasewych
et al., 1997; Oliverio and Mariottini, 2001), cytochrome oxidase 1
(Colgan et al., 2007; Harasewych et al., 1997) and histone H3
sequences (Colgan et al., 2007). The frequency of positive matches
to D. orbita hypobranchial gland genes is likely to increase as a
broader range of sequences from additional Muricidae and other
gastropod species are made available on GenBank.

From tBLASTx analysis, we have identified the phylogenetic
distribution of species that share homology with Dicathais orbita
gene sequences. While less than 32% of sequences could be
positively matched on the gene databases, the 133 matches found
encompassed species from both invertebrates and vertebrates within
the Animal Kingdom, as well as eukaryotic plants, protozoans,
fungi, some prokaryotes and even viruses. The largest share of
matches pertains to chordate sequences, and this may be attributed
to the abundance of these sequences within databases.
Nevertheless, many of the sequences match other molluscan species
and other invertebrate phyla, likely due to the close evolutionary
relationships leading to conserved genes. A significant proportion
of sequences match ciliate protozoans, and it is unclear whether
this is due to similarities between these protists and D. orbita
or to the inclusion of ciliate genes among our hypobranchial gland
expressed genes. The limited number of molluscan gene matches from
our dataset supports the need for a larger number of molluscan
sequences to be identified and released, encompassing a broader
range of functional genes. Only then will we be able to accurately
view trends in gene expression within the hypobranchial gland of
D. orbita.